Abstract

Image hashing is the process of associating a short vector of bits to an image. The resulting sum- 
maries are useful in many applications including image indexing, image authentication and pattern 
recognition. These hashes need to be invariant under transformations of the image that result in 
similar visual content, but should drastically differ for conceptually distinct contents. This paper 
proposes an image hashing method that is invariant under rotation, scaling and translation of the 
image. The gist of our approach relies on the geometric characterization of salient point distribution 
in the image. This is achieved by the definition of a saliency graph connecting these points jointly 
with an image intensity function on the graph nodes. An invariant hash is then obtained by considering the spectrum of this function in the eigenvector basis of the graph Laplacian, that is, its graph Fourier transform. Interestingly, this spectrum is invariant under any relabeling of the graph nodes. The graph reveals geometric information of the image, making the hash robust to image transformation, yet distinct for different visual content. The efficiency of the proposed method is assessed on a set of 2-D MRI slices and on a database of faces.

1 Introduction

Summarizing images by much shorter sets of bits is of strong interest for many different image processing 
applications. The summaries, or hashes, can be used as content identification to efficiently query images 
in a database. In shape matching, hashes can represent patterns of interest in order to find corresponding
patterns [3]. Key-dependent hashes can also be used to authenticate images and ensure their integrity [10].

Image hashing is usually performed in two steps [9]. First, an intermediate hash is produced by
extracting a representative set of parameters from the image. Second, this intermediate hash is quantized 
by means of vector quantization in order to increase its robustness while reducing its effective bit size. 
These two steps are independent and this paper focuses on the first step, namely the production of an 
intermediate hash. 

One main challenge in image hashing is the robustness of the summary with respect to image transformations preserving the visual content. This robustness should be ensured while preserving the ability to distinguish distinct visual contents. Different authors have addressed this problem by proposing hashing methods based on image features [13, 9, 10]. In [7], the hash is produced by locating feature points and recording their relative coordinates in the orthonormal frame defined by two of them. The operation is repeated for all possible pairs of feature points. This approach is robust to global transformations and partial occlusion. However, it is limited to relatively simple patterns since it requires the storage of many coordinates. In [13], the wavelet transform of the image is computed and each subband is tiled into rectangles. The variance or mean value of the intensities is computed for each rectangle and concatenated to produce the intermediate hash. The method presented in [9] uses an iterative region growing in the coarse subband of the discrete wavelet transform and simply records the location of the salient points as the intermediate hash value. Clearly, these last two methods achieve relatively poor results for large image rotation and scaling, since they strongly depend on the order of selection of the feature points. In [10], feature points are extracted as the locations for which an end-stopped wavelet transform is maximized. The recorded hash function is then the normalized histogram of the corresponding wavelet coefficients. Although the feature histogram is more invariant under rotation and scaling, it still cannot ensure invariance of the hash values under large rotation. Besides, Kokiopoulou et al. [5] have recently developed a metric between pattern transformation manifolds and achieved excellent results in terms of rotation and scale invariance. However, their approach is not applied to image hashing and uses orthogonal matching pursuit, which is computationally cumbersome. The lack of robustness under large rotation for most common image hashing methods has been recently identified in [14]. The author therefore proposes a novel hashing approach whose efficiency does not depend on the rotation angle. His approach is based on mean luminance information over image sectors. Although more robust to large rotation, his method is not robust under scaling of the image.

This paper proposes a hashing method that is, by construction, invariant under rotation of any angle and under scaling, up to an interpolation that preserves the significant structures. The presented hash function is built in two steps. First, given a simple salient point detector (Sec. 2.1), a smoothed version of the Harris corner detector [3], a saliency graph is constructed (Sec. 2.2). This structure is a (weighted) undirected graph connecting geographically close salient points. Second, the graph Fourier transform of a function defined on the graph, that is, its spectrum in the graph Laplacian eigenvector basis, is computed (Sec. 3). The use of this graph Fourier transform makes the hash independent of the salient point selection order. Moreover, in order to ensure invariance under transformations of the image, both the feature point selection and the definition of the function need to be invariant. Particular attention is therefore paid to the invariance of these last two elements. Finally, Sec. 4 presents the results of the method applied to the Brainweb database of brain MRI images [5] and to the ORL Database of Faces [11].



2 Saliency Graph 

Our image hashing method relies on the definition of a saliency graph built from particular salient
points and from a certain geographical connectivity between them. This graph will be used in Sec. 3 for
summarizing functions of its node locations in a geometrically consistent way. Hereafter, we first explain
the method used to detect salient points, and then describe how the graph can be generated from them.

2.1 Smoothed Harris Corner Detector 

We define our salient points as the intensity corners discovered by a smoothed Harris detector [3], [5].
These specific points are indeed preserved under the transformations of interest, that is, under image
rotation, translation and scaling. Let us briefly describe this method while insisting on the properties of
interest for our approach.

The smoothed Harris detector aims at detecting corners on the principle that around these points the local intensity gradient strongly varies. Mathematically, given a continuous model $I(x)$ of the image intensity at location $x = (x, y) \in \mathbb{R}^2$, the smoothed Harris corner detector at scales $0 < \sigma < \tau$ uses the matrix field

$$J^{(\sigma,\tau)}(x) = \int_{\mathbb{R}^2} [\nabla I^{(\sigma)}\, \nabla^T I^{(\sigma)}](x')\; g^{(\tau)}(x - x')\, \mathrm{d}x' \;\in\; \mathbb{R}^{2 \times 2}, \qquad (1)$$

where $g^{(\sigma)}$ is the Gaussian kernel of variance $\sigma^2$, $I^{(\sigma)}(x) = [I * g^{(\sigma)}](x)$ is the smoothed copy of $I$ and $\nabla$ stands for the 2-D gradient operator. In other words, since the rank-1 matrix $[\nabla I^{(\sigma)}\, \nabla^T I^{(\sigma)}](x)$ has for eigenvector the gradient $\nabla I^{(\sigma)}(x)$ itself, the matrix $J^{(\sigma,\tau)}(x)$ studies the variability of this vector in a neighborhood of $x$ determined by the window $g^{(\tau)}$. In this paper, we arbitrarily set $\tau = 3\sigma$ in order to have a neighborhood with enough gradient variations, and we drop hereafter the extra parameter $\tau$ from the notations, writing $J^{(\sigma)}$.

Since the Gaussian kernel is isotropic, $J^{(\sigma)}(x)$ is invariant under rotation. If $I(x) \to I(R_\theta^{-1} x)$ for the common $2 \times 2$ rotation matrix $R_\theta$ of angle $0 \le \theta < 2\pi$, we show easily that $J^{(\sigma)}(x) \to R_\theta\, J^{(\sigma)}(R_\theta^{-1} x)\, R_\theta^T$. In particular, the eigenvalues of $J^{(\sigma)}$ remain unchanged under image rotation. Moreover, if the image undergoes a rescaling $I(x) \to I(x/\xi)$ for $\xi > 0$, $J^{(\sigma)}(x) \to c\, J^{(\sigma/\xi)}(x/\xi)$ for some spatially invariant $c > 0$, which links eigenvalues across scales. Under a more realistic discrete model of the image intensity $I$ where $x$ is taken on a pixel grid, these invariances remain approximately true as long as $\sigma$ is larger than a few multiples of the pixel size.

The smoothed Harris corner detector proceeds by analyzing the two eigenvalues $\zeta_1(x) \le \zeta_2(x)$ of $J^{(\sigma)}(x)$. Indeed, on image corners, both eigenvalues are strong and positive [3], [5], while along straight edges, $0 \approx \zeta_1 \ll \zeta_2$. This characterization is observed through the cornerness of $I$, that is,

$$C^{(\sigma)}(x) = \det J^{(\sigma)} - \kappa\, \big(\mathrm{tr}\, J^{(\sigma)}\big)^2 = \zeta_1 \zeta_2 - \kappa\, (\zeta_1 + \zeta_2)^2,$$

for some $\kappa > 0$ (typically set to $\kappa = 0.04$). Corners are then defined as the local maxima of the cornerness (as illustrated in Fig. 1), that is,

$$\mathcal{V}^{(\sigma)} = \{x : C^{(\sigma)}(x) \text{ is locally maximum}\}. \qquad (2)$$
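The cornerness formula can be illustrated with a short sketch. It assumes the three independent entries of the symmetric matrix field are already available as NumPy arrays; the function name `cornerness` and the sample values are hypothetical and only serve to contrast corner-like, edge-like and flat neighborhoods.

```python
import numpy as np

def cornerness(jxx, jxy, jyy, kappa=0.04):
    """det(J) - kappa * tr(J)^2 for a field of symmetric 2x2 matrices."""
    det = jxx * jyy - jxy ** 2
    trace = jxx + jyy
    return det - kappa * trace ** 2

# Illustrative tensors: corner-like (two strong eigenvalues),
# edge-like (one strong eigenvalue) and flat (no gradient energy).
jxx = np.array([1.0, 1.0, 0.0])
jxy = np.array([0.0, 0.0, 0.0])
jyy = np.array([1.0, 0.0, 0.0])
c = cornerness(jxx, jxy, jyy)

# The same value is obtained from the eigenvalues of a single matrix.
J = np.array([[2.0, 1.0], [1.0, 2.0]])
z1, z2 = np.linalg.eigvalsh(J)                  # ascending eigenvalues
c_from_entries = cornerness(J[0, 0], J[0, 1], J[1, 1])
c_from_eigenvalues = z1 * z2 - 0.04 * (z1 + z2) ** 2
```

The corner-like tensor gets a positive response while the edge-like one gets a slightly negative response, which is why local maxima of this quantity single out corners.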

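Corner points invariance: The elements of $\mathcal{V}^{(\sigma)}$ inherit the geometrical invariance of $J^{(\sigma)}$ described above. This fact is obvious for translation and rotation. For image scaling, if $I(x) \to I(x/\xi)$, $J^{(\sigma)}(x) \to c\, J^{(\sigma/\xi)}(x/\xi)$ for some $c > 0$ independent of $x$, and $\mathcal{V}^{(\sigma)} \to \xi\, \mathcal{V}^{(\sigma/\xi)} = \{\xi x : x \in \mathcal{V}^{(\sigma/\xi)}\}$ since $C^{(\sigma)}(x) \to c^2\, C^{(\sigma/\xi)}(x/\xi)$.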
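The scaling rule can be checked with a change of variables. The sketch below writes $\xi$ for the scaling factor and assumes a unit-mass 2-D Gaussian kernel together with $\tau = 3\sigma$; the explicit constant $c = \xi^{-2}$ is derived under these assumptions and is not stated in the text.

```latex
\begin{align*}
I_\xi(x) &:= I(x/\xi), \\
I_\xi^{(\sigma)}(x)
  &= \int_{\mathbb{R}^2} I(u/\xi)\, g^{(\sigma)}(x-u)\,\mathrm{d}u
   = \int_{\mathbb{R}^2} I(v)\, \xi^{2} g^{(\sigma)}\big(\xi\,(x/\xi - v)\big)\,\mathrm{d}v
   = I^{(\sigma/\xi)}(x/\xi), \\
\nabla I_\xi^{(\sigma)}(x) &= \xi^{-1}\, [\nabla I^{(\sigma/\xi)}](x/\xi), \\
J_\xi^{(\sigma)}(x) &= \xi^{-2}\, J^{(\sigma/\xi)}(x/\xi), \qquad
C_\xi^{(\sigma)}(x) = \xi^{-4}\, C^{(\sigma/\xi)}(x/\xi).
\end{align*}
```

The second line uses $\xi^{2} g^{(\sigma)}(\xi w) = g^{(\sigma/\xi)}(w)$ in two dimensions. Since the factor $\xi^{-4}$ is positive and independent of $x$, the local maxima of the cornerness are simply moved from $x$ to $\xi x$.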

Size of $\mathcal{V}^{(\sigma)}$: Generally, the size of $\mathcal{V}^{(\sigma)}$ is controlled by thresholding small values of $C^{(\sigma)}$. In this work we prefer an adaptive formulation where we keep only a fixed number of the strongest local maxima in the cornerness. This will be useful later to control the size of the graph defined from $\mathcal{V}^{(\sigma)}$.
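A minimal sketch of this adaptive selection is given below. It assumes the cornerness is sampled on a pixel grid, uses a strict 3x3 neighborhood test as the notion of local maximum, and ignores border pixels; the helper name `strongest_local_maxima` is hypothetical.

```python
import numpy as np

def strongest_local_maxima(c, n_max):
    """Return up to n_max (row, col) positions of the strongest strict local maxima of c."""
    rows, cols = c.shape
    peaks = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            patch = c[i - 1:i + 2, j - 1:j + 2]
            # Strict maximum: the centre is the unique largest value of its 3x3 patch.
            if c[i, j] == patch.max() and np.count_nonzero(patch == c[i, j]) == 1:
                peaks.append((c[i, j], i, j))
    peaks.sort(key=lambda p: p[0], reverse=True)   # strongest first
    return [(i, j) for _, i, j in peaks[:n_max]]

# Toy cornerness map with three isolated peaks of different strengths.
c = np.zeros((7, 7))
c[1, 1], c[3, 4], c[5, 2] = 3.0, 5.0, 1.0
corners = strongest_local_maxima(c, n_max=2)
```

Keeping a fixed number of peaks, rather than thresholding, bounds the number of graph nodes regardless of the image contrast.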



Choice of $\sigma$ and scale invariance: In order to define an object-dependent smoothing scale $\sigma^*$, we first compute the set $\mathcal{V}^{(\sigma_0)}$ with a minimal scale $\sigma_0$ set to a few pixels. This first point set is voluntarily dense. However, we can compute its diameter $\mathrm{diam}(\mathcal{V}^{(\sigma_0)})$, with $\mathrm{diam}(A) = \max_{x, x' \in A} \mathrm{dist}(x, x')$ for any set of pixels $A \subset \mathbb{R}^2$. If the image contains only one object^1, this diameter is close to the diameter of the object itself. Therefore, by setting in a second round the object-dependent scale $\sigma^* = \rho\, \mathrm{diam}(\mathcal{V}^{(\sigma_0)}) > \sigma_0$, for some $0 < \rho < 1$, the aforementioned scale invariance of the corner set makes $\mathcal{V}^{(\sigma^*)}$ scale invariant^2. In particular, $(\mathrm{diam}\, \mathcal{V}^{(\sigma^*)})^{-1}\, \mathcal{V}^{(\sigma^*)}$ remains identical if $I(x) \to I(x/\xi)$. With this procedure in hand and setting arbitrarily $\rho = 0.025$ for the typical application of Sec. 4, the resulting corner set $\mathcal{V}^{(\sigma^*)}$ is simply written $\mathcal{V}$.
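The two-round scale selection can be sketched as follows, assuming the dense first-round corner set is given as a list of coordinates. The helper names and the sample points are hypothetical, and the sketch omits the check that the resulting scale exceeds the minimal one.

```python
import math
from itertools import combinations

def diameter(points):
    """Largest Euclidean distance between two points of the set."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

def object_scale(points, rho=0.025):
    """Object-dependent smoothing scale: a fixed fraction rho of the set diameter."""
    return rho * diameter(points)

points = [(0.0, 0.0), (30.0, 40.0), (10.0, 5.0)]       # diameter is 50
sigma_star = object_scale(points)

# Rescaling the image by a factor 2 rescales the point set, hence the scale.
scaled = [(2.0 * x, 2.0 * y) for x, y in points]
sigma_star_scaled = object_scale(scaled)
```

Because the scale follows the object size, the ratio between the smoothing scale and the diameter stays constant under image rescaling.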



2.2 Graph definition 

In order to reveal geometric information of the image $I$, a graph can be built upon the detected salient points. A "Saliency Graph" is therefore defined as the undirected graph $\mathcal{G} = \mathcal{G}(I) = (\mathcal{V}, W)$ connecting the corner points $\mathcal{V} = \{c_i : 1 \le i \le N_c\}$ through the definition of the connectivity matrix $W \in \mathbb{R}^{N_c \times N_c}$. In other words, given the diameter $d^* = \mathrm{diam}(\mathcal{G}) = \mathrm{diam}(\mathcal{V})$ and a certain radius $r > 0$ defined later, the connection between $c_i$ and $c_j$ is weighted by $(W)_{ij}$ (a zero weight meaning no connection) and the full matrix reads

$$(W)_{ij} = \begin{cases} \exp\!\Big(-\dfrac{1}{2 r^2 (d^*)^2}\, \|c_i - c_j\|^2\Big), & \text{if } i \neq j \text{ and } \|c_i - c_j\| < 3 r d^*, \\ 0, & \text{else,} \end{cases}$$

where the value 3 ensures that the exponential is set to 0 if it falls below 1.1% of its peak value.
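As an illustration, the connectivity matrix can be assembled as in the sketch below. It is a dense construction for a small number of nodes, with a hypothetical function name and toy coordinates, and it does not use any spatial data structure.

```python
import numpy as np

def connectivity_matrix(corners, r):
    """Gaussian weights between corner points, truncated beyond 3 * r * d_star."""
    c = np.asarray(corners, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    d_star = dist.max()                         # diameter of the corner set
    w = np.exp(-dist ** 2 / (2.0 * (r * d_star) ** 2))
    w[dist >= 3.0 * r * d_star] = 0.0           # drop weak, long-range links
    np.fill_diagonal(w, 0.0)                    # no self-connection (i != j)
    return w

corners = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
W = connectivity_matrix(corners, r=0.2)         # cut-off distance 3 * 0.2 * 10 = 6
cutoff_weight = np.exp(-4.5)                    # weight at the cut-off, about 1.1%
```

Here only the two nearby points are linked; the third one lies beyond the cut-off distance and stays isolated.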

This connectivity choice is motivated by the wish to converge towards the true space geometry when the number of nodes increases [12]. In particular, since the node set discretizes the planar domain, the following graph Laplacian

$$\Delta = S - W \in \mathbb{R}^{N_c \times N_c}, \quad \text{with } (S)_{ij} = \Big(\sum_k W_{ik}\Big)\, \delta_{ij},$$

tends to the continuous planar Laplacian if $N_c \to \infty$. Notice that, whatever $\mathcal{G}$, the vector of ones $\mathbf{1} \in \mathbb{R}^{N_c}$ is such that $\Delta \mathbf{1} = 0$, that is, $\mathbf{1}$ is an eigenvector of zero eigenvalue.
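The Laplacian and its zero eigenvalue can be checked numerically. The following sketch uses a hypothetical 3-node weight matrix and relies on `numpy.linalg.eigvalsh`, which returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

def graph_laplacian(w):
    """Combinatorial Laplacian S - W, with S the diagonal matrix of node degrees."""
    return np.diag(w.sum(axis=1)) - w

# Hypothetical symmetric weights for a 3-node graph.
W = np.array([[0.0, 0.5, 0.2],
              [0.5, 0.0, 0.0],
              [0.2, 0.0, 0.0]])
L = graph_laplacian(W)
residual = L @ np.ones(3)                 # should vanish: rows of L sum to zero
eigenvalues = np.linalg.eigvalsh(L)       # ascending, non-negative up to rounding
```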

The purpose of the Saliency Graph $\mathcal{G}$ is to capture the distribution geometry of the salient points. The definition of the connectivity $W$ is therefore of paramount importance. Interestingly, the radius $r$ weights the impact of the geometry: if $r \to +\infty$ or if $r \to 0^+$, all the nodes are either inter-connected with unit weight (complete graph), or fully disconnected ($W = 0$). In such limit cases, knowledge about the salient point distribution is completely lost. The radius $r$ should therefore be selected carefully between these two extreme cases.

1 The conclusion describes a possible generalization for images with several objects on a smooth background.
2 Of course, this holds only for scaling factors compatible with the image sampling.

3 Invariant Spectral Hashing 

Spectral graph theory [1] studies the properties of a graph through the spectrum of its Laplacian operator. In particular, the $N_c$ Laplacian eigenvectors

$$\mathcal{B} = \{v_j \in \mathbb{R}^{N_c} : 1 \le j \le N_c,\ \Delta v_j = \lambda_j v_j\}, \quad \text{with } v_1 \propto \mathbf{1},\ \lambda_1 = 0,\ \lambda_j \le \lambda_{j+1},$$

constitute an orthonormal basis of $\mathbb{R}^{N_c}$, that is, a basis for any function $f \in \mathbb{R}^{N_c}$ defined on the graph nodes. This basis $\mathcal{B}$ can be alternatively represented as the matrix $B = (v_1, \cdots, v_{N_c}) \in \mathbb{R}^{N_c \times N_c}$, with $B^{-1} = B^T$. The graph Laplacian eigenvector basis is the generalization of the Fourier basis. For a regular distribution of nodes on an infinite plane, $B$ coincides with the 2-D Fourier basis. The Fourier transform of a vector $f \in \mathbb{R}^{N_c}$ living on $\mathcal{G}$ is therefore naturally defined as

$$\hat{f} = B^T f, \quad \text{or} \quad \hat{f}_j = v_j^T f, \quad \forall\, 1 \le j \le N_c.$$

Interestingly, this Graph Fourier Transform (GFT) is invariant under any relabeling of the graph nodes, a useful property since there is no reason why the salient points discovered by the corner detector should be ordered similarly between two similar images. Indeed, given a permutation matrix $\Pi \in \{0, 1\}^{N_c \times N_c}$ with only one 1 per row and column and $\Pi^{-1} = \Pi^T$, it is easy to show that if the nodes of $\mathcal{V}$ are permuted accordingly, $f \to \Pi f$, $\Delta \to \Pi \Delta \Pi^T$ and $\hat{f} \to (\Pi B)^T \Pi f = \hat{f}$. Thanks to this GFT, we propose the following image hashing.
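The relabeling invariance can be verified numerically on a toy graph. The sketch below assumes random node positions, an untruncated Gaussian weighting and a random node signal, all of which are illustrative choices. It compares coefficient magnitudes because a numerical eigensolver returns each eigenvector with an arbitrary sign, and it presumes that the Laplacian eigenvalues are all distinct.

```python
import numpy as np

def graph_fourier_magnitudes(w, f):
    """Return the Laplacian eigenvalues and |B^T f| for weights w and node signal f."""
    laplacian = np.diag(w.sum(axis=1)) - w
    eigenvalues, basis = np.linalg.eigh(laplacian)   # columns of basis are eigenvectors
    return eigenvalues, np.abs(basis.T @ f)

rng = np.random.default_rng(0)
points = rng.random((8, 2))                          # hypothetical node positions
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
W = np.exp(-dist ** 2 / (2.0 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)
f = rng.random(8)                                    # hypothetical node signal

perm = rng.permutation(8)                            # arbitrary relabeling of the nodes
W_perm = W[np.ix_(perm, perm)]
f_perm = f[perm]

lam, spec = graph_fourier_magnitudes(W, f)
lam_perm, spec_perm = graph_fourier_magnitudes(W_perm, f_perm)
```

Both the eigenvalues and the coefficient magnitudes coincide before and after the relabeling, up to rounding errors.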

Definition (Invariant Spectral Hashing). Given a certain Saliency Function $f \in \mathbb{R}^{N_c}$ of $I$, namely a function depending on the salient point locations and on the image intensity $I$, the Invariant Spectral Hash (ISH) of $I$ is the spectrum of $f$, that is,

$$\varphi_{\rm Sp}(I) = |\hat{f}| = |B^T f| \in \mathbb{R}_+^{N_c},$$

combined with the knowledge of the Saliency Graph Laplacian spectrum $\{\lambda_i : 1 \le i \le N_c\}$.

In this hash, the absolute value (applied component-wise on the GFT vector) removes the ambiguity on the eigenvector orientation^3. Consequently, the ISH of $I$ contains information about both salient point distribution (through the underlying graph) and image intensity (through the saliency function).

Saliency function: There is of course an infinite number of possible saliency functions. Given the Saliency Graph $\mathcal{G}$ of an image $I$ determined from $N_c$ salient points, we focus our approach on this one:

$$f_i = f(c_i) = \mathrm{Var}\{I(x) : \forall x \in \mathbb{R}^2,\ \|x - c_i\| < \sigma\}, \quad 1 \le i \le N_c,$$

the value $\sigma$ being the smoothed Harris detector radius. In other words, our saliency function $f = (f_1, \cdots, f_{N_c})^T$ is interested in the variance of $I$ in a neighborhood of each salient point. Taking the variance instead of, for instance, the mean gives the same impact to all the salient points whatever their intensity. What matters here is the variability of $I$ around these points, that is, a variation that is linked to the corner contrast.
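A direct, unoptimized sketch of this saliency function is shown below. It assumes corners given in (row, column) pixel coordinates and uses the population variance over the pixels strictly inside the disk; the function name and the toy image are hypothetical.

```python
import numpy as np

def saliency_function(image, corners, sigma):
    """Variance of the image intensity inside a disk of radius sigma around each corner."""
    rows, cols = np.indices(image.shape)
    values = []
    for ci, cj in corners:
        inside = (rows - ci) ** 2 + (cols - cj) ** 2 < sigma ** 2
        values.append(image[inside].var())
    return np.array(values)

image = np.zeros((9, 9))
image[:, 5:] = 1.0                     # vertical step edge between columns 4 and 5
f = saliency_function(image, corners=[(4, 2), (4, 4)], sigma=2.0)
```

The first point lies in a flat region and gets a zero value, while the second one sits next to the step edge: 3 of the 9 pixels in its disk are bright, which gives a variance of 2/9.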

ISH Complexity: Given an image $I$ of $N$ pixels, the computational complexity of the ISH evaluation is split as follows. For the smoothed Harris detector, the complexity is $O(N \log N)$ by performing fast convolution in the Fourier (FFT) domain. The time-consuming part of the graph definition is the connectivity estimation. This one can be optimized from $O(N_c^2)$ to $O(N_c)$ by using a geographical quadtree data structure of the nodes. The Laplacian eigenvector/eigenvalue decomposition has a complexity of $O(N_c^3)$, with a computation time of about 0.01 s for $N_c = 100$ on a standard laptop. The saliency function is roughly estimated in $O(N_c N)$ computations but it could be optimized with a slight variation of its definition (e.g., using the precomputed cornerness). Finally, the GFT of $f$ has complexity $O(k N_c)$ if it is restricted to the $k$ first Fourier coefficients.

3 Laplacian eigenvector orientation is undetermined since $\Delta(\pm v) = \lambda(\pm v)$ for any eigenvector $v$.



Distance between ISH: In general, for two different images and two different saliency graphs, the two resulting Laplacian eigenvalue systems do not necessarily match. Therefore, in order to develop a consistent distance definition, for any image $I$ related to the ISH $\varphi_{\rm Sp}$ and to the Laplacian spectrum $\{\lambda_1, \cdots, \lambda_{N_c}\}$, we first consider the continuous linear interpolation $\tilde{\varphi}_{\rm Sp} : \mathbb{R} \to \mathbb{R}_+$ of the couples $\{(\sqrt{\lambda_1}, \varphi_1), \cdots, (\sqrt{\lambda_{N_c}}, \varphi_{N_c})\}$ such that $\tilde{\varphi}_{\rm Sp}(\sqrt{\lambda_i}) = \varphi_i$, where the square root enforces the common Fourier reading of the spectrum^4. Then, for two images $I$ and $I'$, their ISH distance up to the $k$-th eigenvalue ($1 \le k \le N_c$) is defined as

$$\big(\mathcal{D}_{\rm Sp}(I, I')\big)^2 = \int_0^{(\min(\lambda_k, \lambda'_k))^{1/2}} \big|\tilde{\varphi}_{\rm Sp}(\omega) - \tilde{\varphi}'_{\rm Sp}(\omega)\big|^2\, \mathrm{d}\omega.$$
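A numerical approximation of this distance can be sketched as follows. The sketch assumes eigenvalues sorted in ascending order, uses `numpy.interp` for the piecewise-linear interpolation and a trapezoidal rule on a uniform frequency grid; the function name, the grid size `n_samples` and the sample values are hypothetical.

```python
import numpy as np

def ish_distance(lam_a, phi_a, lam_b, phi_b, k, n_samples=512):
    """Approximate ISH distance from the first k eigenvalues and GFT magnitudes."""
    freq_a = np.sqrt(np.clip(lam_a[:k], 0.0, None))   # sqrt(lambda) read as a frequency
    freq_b = np.sqrt(np.clip(lam_b[:k], 0.0, None))
    upper = min(freq_a[-1], freq_b[-1])               # sqrt(min(lambda_k, lambda'_k))
    omega = np.linspace(0.0, upper, n_samples)
    gap = np.interp(omega, freq_a, phi_a[:k]) - np.interp(omega, freq_b, phi_b[:k])
    squared = gap ** 2
    # Trapezoidal rule for the integral of the squared gap over [0, upper].
    integral = np.sum(0.5 * (squared[1:] + squared[:-1]) * np.diff(omega))
    return float(np.sqrt(integral))

lam_a, phi_a = np.array([0.0, 1.0, 4.0]), np.array([1.0, 2.0, 3.0])
lam_b, phi_b = np.array([0.0, 1.0, 4.0]), np.array([2.0, 3.0, 4.0])
d = ish_distance(lam_a, phi_a, lam_b, phi_b, k=3)     # constant gap of 1 over [0, 2]
```

In this toy case the two interpolated spectra differ by a constant 1 on an interval of length 2, so the distance is the square root of 2.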

Distance between Laplacian spectra: Since Laplacian eigenvalues encode the saliency graph geometry [1], it is worth introducing a distance between two Laplacian spectra. With the notations of the previous section this distance reads

$$\big(\mathcal{D}_{\lambda}(I, I')\big)^2 = \sum_{i=1}^{k} |\lambda_i - \lambda'_i|^2.$$

We will observe in Sec. 4 that this distance can improve the performance of a characterization by $\mathcal{D}_{\rm Sp}$. Indeed, for similar visual contents, both $\mathcal{D}_{\rm Sp}$ and $\mathcal{D}_{\lambda}$ should be low, and so should be their product $\mathcal{D}_{\rm Sp} \mathcal{D}_{\lambda}$.
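For completeness, the spectra distance and its combination with a spectral hash distance can be sketched as below; the function names and the sample spectra are hypothetical.

```python
import numpy as np

def spectrum_distance(lam_a, lam_b, k):
    """Euclidean distance between the first k Laplacian eigenvalues of two graphs."""
    a = np.asarray(lam_a, dtype=float)[:k]
    b = np.asarray(lam_b, dtype=float)[:k]
    return float(np.sqrt(np.sum((a - b) ** 2)))

def combined_distance(d_sp, d_lambda):
    """Product of the spectral hash distance and the spectra distance."""
    return d_sp * d_lambda

d_lambda = spectrum_distance([0.0, 1.0, 4.0], [0.0, 2.0, 2.0], k=3)   # sqrt(0 + 1 + 4)
d_product = combined_distance(0.5, d_lambda)
```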

Ordered hash (OH): Of course, there is another very simple hash defined from any saliency function $f = f(I)$. This is the ordered hash $\varphi_{\rm ord}(I) = |f^*| \in \mathbb{R}_+^{N_c}$, obtained by reordering the values of $f$ in a vector $f^*$ such that $|f^*_i| \ge |f^*_{i+1}|$ for any $1 \le i < N_c$. The distance between two ordered hashes is then simply computed as $(\mathcal{D}_{\rm ord}(I, I'))^2 = \|\varphi_{\rm ord} - \varphi'_{\rm ord}\|^2$. As explained later, the ordered hash performs well but it requires all the $N_c$ sorted values in order to reach the same results as an ISH using only a fraction of the frequencies.
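The ordered hash is straightforward to sketch. The code below sorts magnitudes in decreasing order and, as an assumption for hashes of unequal length, compares them over the shorter one; the function names are hypothetical.

```python
import numpy as np

def ordered_hash(f):
    """Magnitudes of the saliency values sorted in decreasing order."""
    return np.sort(np.abs(np.asarray(f, dtype=float)))[::-1]

def ordered_distance(f_a, f_b):
    """Euclidean distance between two ordered hashes."""
    h_a, h_b = ordered_hash(f_a), ordered_hash(f_b)
    n = min(len(h_a), len(h_b))        # assumption: compare over the shorter hash
    return float(np.linalg.norm(h_a[:n] - h_b[:n]))

h = ordered_hash([0.2, 0.9, 0.5])
d_same = ordered_distance([0.2, 0.9, 0.5], [0.5, 0.2, 0.9])   # same values, relabeled
```

Sorting makes the hash independent of the node labeling, but it discards any information about where the salient points are located.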

4 Experiments 

Image hashing pursues two competing goals: robustness and discrimination. In other words, the distance between hashes should be low for similar images (whatever the considered transformations) and high for different visual contents. Whether two images are similar or not can therefore be decided by comparing the distance between their hashes with a threshold value $T > 0$, that is, given a certain distance $\mathcal{D}$, two images $I$ and $I'$ are characterized as "similar" if $\mathcal{D}(I, I') < T$ (positive test), and as different otherwise (negative test).

In this paper, we do not focus on an optimal threshold selection for the distances of interest. We rather evaluate the common True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) quantities for all possible $T$. This procedure allows us to estimate (i) the Receiver Operating Characteristic (ROC) curve, which presents the sensitivity of the test, or True Positive Rate ($\mathrm{TPR}(T) = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$), versus the False Positive Rate ($\mathrm{FPR}(T) = \mathrm{FP}/(\mathrm{FP} + \mathrm{TN})$), and (ii) the Area Under the Curve (AUC) of the ROC, which equals the probability that a random pair of similar images is assigned a lower distance than that of a random pair of distinct images [2]. This AUC quantifies the discrimination and robustness performance of the ROC curves.
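The evaluation procedure can be sketched in a few lines of plain Python. The distances below are hypothetical, and the AUC is computed directly from its probabilistic reading, counting ties as one half.

```python
def roc_points(similar, distinct, thresholds):
    """(FPR, TPR) pairs when pairs with distance < T are declared similar."""
    points = []
    for t in thresholds:
        tp = sum(d < t for d in similar)      # similar pairs correctly accepted
        fp = sum(d < t for d in distinct)     # distinct pairs wrongly accepted
        points.append((fp / len(distinct), tp / len(similar)))
    return points

def auc(similar, distinct):
    """Probability that a similar pair gets a lower distance than a distinct pair."""
    wins = 0.0
    for s in similar:
        for d in distinct:
            wins += 1.0 if s < d else 0.5 if s == d else 0.0
    return wins / (len(similar) * len(distinct))

similar = [0.1, 0.2, 0.4]      # hypothetical distances between similar images
distinct = [0.3, 0.5, 0.6]     # hypothetical distances between distinct images
curve = roc_points(similar, distinct, thresholds=[0.0, 0.35, 1.0])
area = auc(similar, distinct)
```

With these values, 8 of the 9 possible (similar, distinct) pairings are ranked correctly, which gives an AUC of 8/9.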

Experimental setup: The databases used in our experiments are a T2-modulation volume MRI cut into slices along the xy-directions, from the Brainweb simulator [5], and the ORL Database of Faces [11]. In order to test the ISH, three sets of transformations have been applied to these images: (i) 9 rotations of angles between 0 and $\pi$, (ii) scalings of factors between 0.8 and 1.2, and (iii) 9 random combinations of these rotations and scalings. A schematic illustration of the face image manifold (that has a polar representation for each image) is shown in Fig. 1.

4 On the line, a Fourier mode of frequency $\omega$ is a Laplacian eigenvector with eigenvalue $\omega^2$.

Figure 1: (Left) Corners (gray squares) discovered by the smoothed Harris detector for a brain MRI image. (Right) Schematic representation of the image manifold for the faces shown in polar coordinates $(\alpha, \theta)$ where $\alpha$ is the scaling factor and $\theta$ is the rotation angle (the brain image manifold can be represented similarly).



For each image, the number of extracted salient points was set^5 to at most $N_c = 100$, and the Gaussian kernel used for saliency detection has a standard deviation $\sigma$ of 2.5% of the graph diameter $d^*$. For the value of the connectivity radius $r$ (Sec. 2.2), good results have been obtained with $r = 1/15$. With this value, each node in the resulting saliency graphs was connected to an average of 5 other nodes.



Results and discussions: The ROC curves testing rotation invariance, scaling invariance, and mixed rotation and scaling invariance have been computed for the two databases and for $\mathcal{D}_{\rm ord}$, $\mathcal{D}_{\rm Sp}$ and $\mathcal{D}_{\rm Sp} \mathcal{D}_{\lambda}$. For these last two distances, the ROC curves have been obtained by keeping only the $k = 10$ first eigenvalues and GFT coefficients. The ROC curves testing mixed rotation and scaling are shown in Fig. 2 for the two databases. All the related AUCs are summarized in Fig. 3.

For the brain MRI database, $\mathcal{D}_{\rm Sp}$ achieves sensitivities over 90% with false positive rates lower than 10%. Under rotation only, a sensitivity of 95% with a false positive rate of 8% is achieved. Results for the ORL Database of Faces were slightly worse due to the lower number of salient points detected. For all faces, the maximum possible number of salient points, i.e., all the local maxima of the cornerness function, was systematically lower than the imposed maximum of $N_c = 100$. The hashing was therefore more sensitive to the variations of salient point positions between different transformations of the same image.




Figure 2: ROC curves for mixed image rotation and scaling: (left) brain database, (right) faces database. The "-■-", dashed and continuous curves show the robustness when $\mathcal{D}_{\rm ord}$, $\mathcal{D}_{\rm Sp}$ and $\mathcal{D}_{\rm Sp} \mathcal{D}_{\lambda}$ are used, respectively.

For the brain database, an interesting result is that the distance between brain slices that are physically close is shorter than the distance between slices wide apart in the brain. In other words, if we consider the brain MRI as a volumetric image $I_z(x) = I(x, y, z)$ for which each slice in the database is the result of fixing $z$ to some value, then the expectation of the distance $\mathcal{D}_{\rm Sp}$ between two slices separated along the $z$-axis by a distance $\delta_z > 0$,

$$m_d(\delta_z) = \mathbb{E}\{\mathcal{D}_{\rm Sp}(I_z, I_{z+s}) : s = \pm\delta_z,\ z \in \mathbb{R}\}, \qquad (3)$$

is an increasing function of $\delta_z$ for $\delta_z$ sufficiently close to zero. Fig. 4 shows the evolution of $m_d(\delta_z)$ (with two dashed curves providing the 99% confidence interval on the estimation) computed over the 100 brain slices rotated and scaled. The expectation of the distance is indeed increasing for $\delta_z < 10$. This means that the distance $\mathcal{D}_{\rm Sp}$ between ISHs truly reflects the difference between visual contents. The MRI was indeed taken with a $z$-resolution of 1 mm, for which the visual contents of contiguous slices are very close, as depicted in Fig. 4. The false positive pairs of images are therefore more likely to be adjacent slices, which are visually close, than totally different slices.

5 When the number of salient points was smaller than $N_c$, the distances $\mathcal{D}_{\rm Sp}$, $\mathcal{D}_{\rm ord}$ and $\mathcal{D}_{\lambda}$ have been computed relative to the smallest hash size.

Figure 3: (Left) AUC table for the different transformations of the images using the distance between ordered hashes ($\mathcal{D}_{\rm ord}$), the distance between spectral hashes ($\mathcal{D}_{\rm Sp}$), and spectral hashing combined with spectra comparison ($\mathcal{D}_{\rm Sp} \mathcal{D}_{\lambda}$). (Right) Area under the ROC curve for an increasing number of ISH coefficients and eigenvalues. Most information is contained in the 10 first coefficients, after which the AUC remains mostly constant.

Figure 4: (Left) The expectation of the distance between slices of the brain database is an increasing function of the actual physical distance between slices. (Right) Six contiguous slices of the brain. The visual content does not change much from slice to slice. A suitable image metric should therefore yield low distances between them.
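An empirical version of the mean distance as a function of slice separation can be sketched as follows, assuming a precomputed symmetric matrix of hash distances between consecutive slices. The function name and the synthetic, linearly growing distances are hypothetical.

```python
import numpy as np

def mean_distance_by_separation(dist_matrix, max_sep):
    """Average hash distance over all slice pairs separated by 1..max_sep slices."""
    d = np.asarray(dist_matrix, dtype=float)
    # The k-th off-diagonal gathers every pair (z, z + k); by symmetry of the
    # matrix this also accounts for the pairs (z, z - k).
    return [float(np.diagonal(d, offset=k).mean()) for k in range(1, max_sep + 1)]

# Hypothetical distances that grow linearly with the slice separation.
z = np.arange(6)
D = np.abs(z[:, None] - z[None, :]) * 0.1
m_d = mean_distance_by_separation(D, max_sep=3)
```

An increasing sequence of averages is the behaviour expected from a metric that reflects the gradual change of visual content across slices.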

In order to validate the performance of the ISH compared to the naive ordered hashing (OH) ordering the $N_c$ values of the saliency function, all the ROC curves were computed using only the $k = 10$ first GFT coefficients and the 10 first Laplacian eigenvalues. Therefore, the hash lengths related to the use of $\mathcal{D}_{\rm Sp}$ and $\mathcal{D}_{\rm Sp} \mathcal{D}_{\lambda}$ are both equal to 20, that is, 20% of the tested OH hash length. Results show that the spectral hashing with fewer coefficients performs as well as or better than the naive ordered hashing. It is interesting to quantify the gain in discrimination when the number $k$ of GFT coefficients and eigenvalues increases in the spectral hashing. This can be evaluated by computing the AUC for an increasing number of coefficients. This evolution is depicted in Fig. 3 for both the spectral hashing and the combination of the spectral hashing with the spectral comparison. As a result, the performance does not increase much when more than $2 \times 10$ coefficients are retained. The spectral hashing is therefore able to extract the information useful to discriminate between different visual contents in fewer coefficients.

5 Conclusion 

This paper has shown that the geometry of salient point distribution can advantageously be considered in order to form an invariant image hashing. This geometrical inclusion is achieved through the Laplacian spectrum of a Saliency Graph built by connecting geographically close salient points. In consequence, the associated Graph Fourier Transform of some saliency function, which can be complemented with the Laplacian eigenvalue distribution, provides a robust and discriminant image hashing. Moreover, compared to the ordered hashing where the knowledge of the salient point distribution is lost, the Invariant Spectral Hashing requires far fewer values for the same efficiency. In future research, the impact of the connectivity parameters (like the radius $r$) on the classification procedure will be assessed, together with a careful study of different quantization strategies (e.g., scalar quantization of the different spectra). We also expect to achieve a characterization of images made of several distinct objects arranged on a smooth background. The saliency graph can indeed serve to partition the image thanks to the structure of the first Laplacian eigenvectors (like the zero crossing paths).